Abstract We reanalyze high resolution data from the New York Stock Exchange and find a monotonic 
(but not power law) variation of the mean value per trade, the mean number of trades per minute and 
the mean trading activity with company capitalization. We show that the second moment of the traded 
value distribution is finite. Consequently, the Hurst exponents for the corresponding time series can be 
calculated. These are, however, non-universal: The persistence grows with larger capitalization and this 
results in a logarithmically increasing Hurst exponent. A similar trend is displayed by intertrade time 
intervals. Finally, we demonstrate that the distribution of the intertrade times is better described by a 
multiscaling ansatz than by simple gap scaling. 

PACS. 89.75.-k Complex systems - 89.75.Da Systems obeying scaling laws - 05.40.-a Fluctuation
phenomena, random processes, noise, and Brownian motion - 89.65.Gh Economics; econophysics, financial
markets, business and management



Understanding the financial market as a self-adaptive, 
strongly interacting system is a real interdisciplinary chal- 
lenge, where physicists strongly hope to make essential 
contributions [1, 2, 3]. The enthusiasm is understandable
as the breakthrough of the early 70's in statistical physics
taught us how to handle strongly interacting systems with 
a large number of degrees of freedom. The unbroken de- 
velopment of this and related disciplines brought up sev- 
eral concepts and models like (fractal and multifractal) 
scaling, frustrated disordered systems, or far from equilib- 
rium phenomena and we have obtained very efficient tools 
to treat them. Many of us are convinced that these and
similar ideas and techniques will be helpful to understand
the mechanisms of the economy. In fact, there have been
quite successful attempts along this line 0,11, |(| An ubiq- 
uitous aspect of strongly interacting systems is the lack 
of finite scales. The best understood examples are second 
order equilibrium phase transitions where renormalization 
group theory provides a general explanation of scaling and 
universality Q- It seems that some features of the stock 
market can indeed be captured by these concepts: For ex- 
ample, the so called inverse cube law of the distribution of 
logarithmic returns shows a quite convincing data collapse 
for different companies with a good fit to an algebraically 
decaying tail [3, ||| • 

Studies in econophysics concentrate on the possible 
analogies, although there are important differences be- 
tween physical and financial systems. This is, of course, 
a trivial statement - it is enough to refer to the above- 
mentioned self-adaptivity, to the possibility of influencing 
the system by its characterization or to the intrinsic
non-stationarity of economic processes. Here we would like to
emphasize the discrepancy in the levels of description. In 
the case of a physical system undergoing a second order 
phase transition, it is natural to assume scaling on pro- 
found theoretical grounds and the (experimental or theo- 
retical) determination of, e.g., the critical exponents is a 
fully justified undertaking. There is no similar theoretical 
basis for the financial market whatsoever, therefore in this 
case the assumption of power laws should be considered 
only as one possible way of fitting fat tailed distributions. 
Also, the appeal to universality is not well founded,
as the robustness of qualitative features - like the fat tail
of the distributions - is a much weaker property. Therefore,
e.g., averaging distributions over companies with very
different capitalization is questionable. While we fully
acknowledge the process of understanding based on analogies
as an important method of scientific progress, we emphasize
that special care has to be taken in cases where
the theoretical support is sparse [ljj. Motivated by this, 
the aim of the present paper is to carry out a careful anal- 
ysis of the high resolution data of the New York Stock 
Exchange with special emphasis on the effects caused by 
the size of the companies. 

The paper is organized as follows. After the introduction
of notations in Section 1, Section 2 presents the results
on the capitalization dependence of various measures
of trading activity. In Section 3 we show that the distribution
of the traded values is not Levy stable as suggested
previously [11]. Consequently, the Hurst exponents of the
related time series exist; these are analyzed in Section 4.
We point out that correlations in trading activity are






Zoltan Eisler, Janos Kertesz: Size matters: some stylized facts of the stock market revisited 



strongly non-universal with respect to company size, and 
that the Hurst exponent of the traded value depends logarithmically
on the mean traded value per minute. Section 5
deals with the time intervals between trades and we give
indications that their distribution is better described by
a multiscaling ansatz than by the gap scaling proposed earlier
[13]. Finally, Section 6 concludes.



1 Notations and data 

For a given time window size $\Delta t$, let the total traded value
(activity, flow) of the $i$-th stock at time $t$ be

$$f_i^{\Delta t}(t) = \sum_{n,\; t_i(n) \in [t, t+\Delta t]} V_i(n), \qquad (1)$$

where $t_i(n)$ is the time when the $n$-th transaction of the
$i$-th stock takes place. This corresponds to the coarse-graining
of the individual events, or the so-called tick-by-tick
data. The latter is denoted by $V_i(n)$; this is the value
traded in transaction $n$, and it is the product of the price $p$
and the traded volume of stocks $\tilde{V}$,

$$V_i(n) = p_i(n)\,\tilde{V}_i(n). \qquad (2)$$

Price usually changes only a little from trade to trade,
while the number of stocks traded in consecutive deals
varies heavily. Thus, the fluctuations and the statistical
properties of the traded value $f(t)$ are basically governed
by those of $\tilde{V}$. Price only serves as a conversion factor
to US dollars, which makes the comparison of stocks possible.
This way, one also automatically corrects the data for
stock splits. The statistical properties (normalized distribution,
correlations, etc.) are otherwise practically indistinguishable
between traded volume and traded value.
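As an illustration of the coarse-graining in Eqs. (1) and (2), the following is a minimal sketch, not the procedure actually used on the TAQ data. The function name `traded_value_series` and its arguments are hypothetical, and two simplifications are assumed: the windows are consecutive and non-overlapping, and each window is half-open, $[t, t+\Delta t)$, so that a trade on a boundary is counted once.

```python
import numpy as np

def traded_value_series(trade_times, prices, volumes, dt, t_start, t_end):
    """Sum the traded value p(n) * V(n) of all trades in consecutive windows.

    Assumptions (not from the paper): windows are non-overlapping and
    half-open, [t, t + dt), and all times share one unit (e.g. minutes).
    """
    trade_times = np.asarray(trade_times, dtype=float)
    # Eq. (2): value of a single transaction = price * number of shares
    values = np.asarray(prices, dtype=float) * np.asarray(volumes, dtype=float)
    n_windows = int(np.ceil((t_end - t_start) / dt))
    # Index of the window each trade falls into
    idx = np.floor((trade_times - t_start) / dt).astype(int)
    inside = (idx >= 0) & (idx < n_windows)
    # Eq. (1): add up the values per window; windows without trades stay 0
    return np.bincount(idx[inside], weights=values[inside], minlength=n_windows)
```

Windows without any transaction simply receive the value zero, which is why the traded value remains well defined even for rarely traded stocks.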

As the source of empirical data, we used the TAQ 
database [R| which records all transactions of the New 
York Stock Exchange in the years 1993 — 2003. 

Finally, we note that throughout the paper we use 10- 
base logarithms. 



2 Capitalization and basic measures of 
trading activity 

Many previous studies of trading focus on the stocks of 
large companies. These certainly have the appealing prop- 
erty that price and returns are well defined even on short 
time scales due to the high frequency of trading. For infre- 
quently traded stocks, there are extended periods without 
transactions, and thus prices and returns are undefined. In 
contrast, other quantities regarding the activity of trad- 
ing, such as traded value/volume or the number of trades 
can be defined, even for those stocks where they are zero 
for most of the time. 

In this section we extend the study of Zumbach 
which concerned companies of the top two orders of mag- 
nitude in capitalization at the London Stock Exchange. 



Instead, we analyze the 3347 stocks$^1$ that were traded continuously
at NYSE for the year 2000. This gives us a range
of approximately $10^6 \ldots 6 \cdot 10^{11}$ USD in capitalization.

Following Ref. [3], we quantify the dependence of trading
activity on company capitalization $C_i$. Mean value per
trade $\langle V_i \rangle$, mean number of trades per minute $\langle N_i \rangle$ and
mean activity (traded value per minute) $\langle f_i \rangle$ are plotted
versus capitalization in Fig. 1. Ref. [TjJ found that all three
quantities have power law dependence on $C_i$; however, this
simple ansatz does not seem to work for our extended
range of stocks. While mean trading activity can be approximated
as $\langle f_i \rangle \propto C_i^{0.98 \pm 0.06}$ to an acceptable quality,
neither $\langle V \rangle$ nor $\langle N \rangle$ can be fitted by a single power law
in the whole range of capitalization. Nevertheless, there
is - not surprisingly - a monotonic dependence: higher
capitalized stocks are traded more intensively.

One can gain further insight from Fig. 2, which shows
that for the largest 1600 stocks

$$\langle V_i \rangle \propto \langle N_i \rangle^{\beta} \qquad (3)$$

with $\beta = 0.57 \pm 0.09$. The estimate based on the results
of Zumbach for the stocks in London's FTSE-100 is
$\beta \approx 1$. Similar results were recently obtained for NASDAQ

m 

For the smaller stocks there is no clear tendency. This
effect can be interpreted as follows. As we move to stocks
with smaller and smaller capitalization, the average transaction
size $\langle V \rangle$ cannot decrease indefinitely. Transaction
costs must impose a minimal number/value of stocks in a
single transaction that can still be exchanged profitably.
This minimal size is observed as the constant regime for
small $\langle N \rangle$. On the other hand, once a stock is exchanged
more frequently (the crossover happens at about $\langle N \rangle = 0.05$
trades/min), it is no longer traded in this "minimal"
unit. With the growing speed of trading, trades tend to
"stick together", and it becomes possible to exchange larger
packages. This increase is clear but not dramatic: it is up to
one order of magnitude. Although increasing package sizes
reduce transaction costs, price impact [Tfit fl7t ITslllflll
increases, possibly decreasing profits and thus limiting package
sizes. The interplay of these two effects has a role in
the formation of relationship (3).



3 Traded value distributions revisited 

The statistical properties of the trading volume of stocks
have previously been investigated in Ref. [11]. That work
finds that the cumulative distribution of traded volume in
$\Delta t = 15$ minute windows has a power-law tail with a tail
exponent $\lambda = 1.7 \pm 0.1$. This is the so-called inverse half
cube law. Formally, this corresponds to

$$P_{\Delta t}(f) \propto f^{-(\lambda+1)}, \qquad (4)$$

1 Note that many minor stocks do not represent actual com- 
panies, they are only, e.g., preferred class stocks of a larger 
enterprise. 



Figure 1. Capitalization dependence of certain measures of trading activity in the year 2000. The graphs are monotonically
increasing and are (piecewise) well approximated by power laws as indicated. All three tendencies curve downward for large
capitalizations. (a) Mean value per trade $\langle V \rangle$ in USD. The fitted slope corresponds to the regime $5 \cdot 10^7 < C < 7.5 \cdot 10^{10}$
USD. (b) Mean number of trades per minute $\langle N \rangle$. The slope on the left is from a fit to $C < 4.5 \cdot 10^9$ USD, while the one on
the right is for $C > 4.5 \cdot 10^9$ USD. (c) Mean trading activity (exchanged value per minute) $\langle f \rangle$ in USD. The plots include 3347
stocks that were continuously available at NYSE during 2000.



where $P_{\Delta t}$ is the probability density function of traded
volume (value) on a time scale $\Delta t$.

Ever since, great effort has been devoted to explaining this
exponent in terms of the inverse cube law of stock returns
[1,13 0]. However, the exact distribution and the possible
exponents are still much debated [18, 20], and it has
been shown that the shape of such a distribution depends
systematically on the capitalization of the company [21].

The estimation of the tail exponent is a delicate matter.
Following the methodology of Ref. [11] - and for the
same 1994-1995 period of data - we repeated these measurements.
Our results for the $\Delta t = 15$ min distribution
are shown in Fig. 3 for three major stocks. The tails of
these distributions can be fitted by a power law over an
order of magnitude, for the top 5-10% of the events. The
exponent $\lambda$ we find is significantly higher than 1.7; it is
around 2.2 for these examples.

For systematic calculations of $\lambda$, there is a range of
mathematical tools available. We used three variants of
Hill's method [22, 23] to estimate the tail exponent; details
can be found in Appendices A and B. All three have a
common parameter: the number $k$ of largest events that
belong to the tail. The statistical weight associated with
the tail events is $p = k/L$, where $L$ is the total length of
our time series. From Fig. 3 one can see that $p \approx 5$-$10\%$ is
the proper choice as a threshold for the asymptotic regime.

For the two-year period 1994-1995 and separately
for the single year 2000, we took the 1000 stocks with the
highest total traded value in the TAQ database. We detrended
their trading activity by the well-known U-shaped
intraday pattern (see, e.g., Ref. pij). Then, we calculated
the distribution of $\lambda$ over these stocks. The median and
the width of this distribution (characterized by the half
distance of the 25% and 75% quantiles) are shown in
Tables 1 and 2 for various time windows $\Delta t$.
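The summary statistic quoted in the tables (median, plus or minus half the distance between the 25% and 75% quantiles) can be sketched as follows. The function name `median_and_half_iqr` is hypothetical, and the quantiles use NumPy's default linear interpolation, which is an assumption since the text does not specify a convention.

```python
import numpy as np

def median_and_half_iqr(exponents):
    """Return (median, half distance between the 25% and 75% quantiles).

    Quantiles use NumPy's default linear interpolation (an assumption).
    """
    x = np.asarray(exponents, dtype=float)
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return q50, 0.5 * (q75 - q25)
```

Applied to the per-stock estimates of the tail exponent for a fixed time window, the two returned numbers correspond to one table entry of the form "median ± width".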

The choice $p = 0.06$ in Hill's method provides results
in line with Ref. [11]. For $\Delta t = 15$ min time windows,
one finds $\lambda = 1.71 \pm 0.20$ for the period 1994-1995. However,
other estimates are significantly higher, $\lambda > 2$. Moreover,
two estimators show a strong tendency of increasing
$\lambda$ with increasing time windows. Monte Carlo simulations
on surrogate datasets show that this is beyond what could
be explained by decreasing sample size. It is well known
that for $\lambda < 2$ the distribution would have to converge
to the corresponding Levy distribution when $\Delta t \to \infty$.
The measured $\lambda$'s should then also be independent of $\Delta t$. On
the other hand, for $\lambda > 2$, the $\Delta t \to \infty$ limit distribution
is a Gaussian. Accordingly, for finite samples, the measured
effective value of $\lambda$ increases with $\Delta t$. This systematic
dependence makes us conclude that there is a strong
indication for the existence of the second moment.

Figure 2. Plot of mean value per trade $\langle V \rangle$ versus mean number
of trades per minute $\langle N \rangle$ for the year 2000 of NYSE. For
smaller stocks there is no clear tendency. For the top ~1600
companies ($\langle N \rangle > 0.05$ trades/min), however, there is scaling
with an exponent $\beta = 0.57 \pm 0.08$. Note: The plot includes
3347 stocks that were continuously available at NYSE during
2000. Note: The first few points correspond to stocks that are
traded less than daily. These typically do not represent individual
companies and might be traded according to different
rules. However, unlike prices or returns, $V$, $N$ and $f$ still remain
well-defined quantities for such stocks.

Figure 3. Distributions of traded value in $\Delta t = 15$ min time
windows, divided by the mean. The plot displays three example
stocks for the period 1994-1995. The numbers show some
upper quantiles of the distribution (probability of values higher
than indicated by the corresponding dashed line). The dashed
and solid diagonal lines represent power laws with exponents
corresponding to $\lambda = 1.7$ and $2.2$, respectively.

One must keep in mind that all three methods assume
that the variable is asymptotically distributed as a power law, and
none of them proves it. If this does not hold, then the
estimates of exponents are only a parametric characterization
of the unknown functional form; nevertheless, they
do suggest that the second moments exist. If the distribution
is indeed of this limiting form, then, although
for short time windows ($\Delta t < 60$ min) there is a fraction
of stocks whose estimate gives $\lambda < 2$, even those display
$\lambda > 2$ for larger $\Delta t$.

Based on these results we conclude that the second 
moments of the distribution must exist for any At, there- 
fore the calculation of the Hurst exponent for the related 
time series is meaningful. Similar qualitative features were
found for the years 2001 and 2002.



4 Non-universality of correlations in traded 
value time series 

Scaling methods [2(| FJt], HU have long been used to char- 
acterize a wide variety of time series, including stock prices 
and trading volumes 00. In particular, the Hurst exponent
$H(i)$ is usually calculated. For the traded value time
series $f_i^{\Delta t}(t)$ of stock $i$, it is defined as

$$\sigma_i^2(\Delta t) = \left\langle \left( f_i^{\Delta t}(t) - \left\langle f_i^{\Delta t}(t) \right\rangle \right)^2 \right\rangle \propto \Delta t^{2H(i)}, \qquad (5)$$

where the average is taken over the time variable $t$. As
discussed in Sec. 3, the variance on the left hand side exists
for any stock or time scale $\Delta t$.
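The definition in Eq. (5) suggests a simple way to estimate a Hurst exponent: aggregate the series in windows of increasing size, measure the standard deviation for each size, and fit the slope on a log-log plot. The sketch below illustrates this idea only; it is not necessarily the estimator used for the results reported here. The name `hurst_from_variance`, the choice of non-overlapping windows and the plain least-squares fit are all assumptions.

```python
import numpy as np

def hurst_from_variance(f, window_sizes):
    """Estimate H from sigma(dt) ~ dt**H, following the form of Eq. (5).

    f            : traded value at the finest resolution (1D array)
    window_sizes : window lengths, in units of the finest resolution
    Non-overlapping windows and an ordinary least-squares fit in
    log10-log10 coordinates are assumptions of this sketch.
    """
    f = np.asarray(f, dtype=float)
    log_dt, log_sigma = [], []
    for w in window_sizes:
        n = len(f) // w
        if n < 2:
            continue  # too few windows to estimate a variance
        # Coarse-grain: total traded value in each window of length w
        agg = f[: n * w].reshape(n, w).sum(axis=1)
        sigma = agg.std()
        if sigma > 0:
            log_dt.append(np.log10(w))
            log_sigma.append(np.log10(sigma))
    slope, _intercept = np.polyfit(log_dt, log_sigma, 1)
    return slope
```

For an uncorrelated series the fitted slope should be close to 0.5. To look for a crossover such as the one around the daily scale, the fit can be restricted to window sizes below or above that scale.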

Ref. [11] finds strong correlations in $f_i^{\Delta t}(t)$ with $H \approx 0.83$.
Their analysis comprises the 1000 largest companies
in the period 1994-1995 and they use $\Delta t > 1$ day except
for some very frequently traded stocks.



We extend these measurements to all 2647 stocks that
were continuously traded in the period 2000-2002. The
time series display a crossover from a lower to a higher
value of $H(i)$ around the time scale of one day (for an example,
see the inset of Fig. 4). A similar effect was reported
for intertrade times of large companies. Intraday correlations
are not meaningful for some of the smallest companies,
as their shares are often not exchanged for several
days. Nevertheless, for any choice of time windows, one
recovers a tendency: with the change of average traded
value $\langle f_i \rangle$, there is a clear logarithmic trend in the Hurst
exponent, especially above the daily scale:

$$H(i) = H(i=1) + \gamma \log \langle f_i \rangle, \qquad (6)$$

where the normalization is such that $\langle f_{i=1} \rangle = 1$. Measurement
results and values of $\gamma$ are given in Fig. 4. Calculations for
the periods 1994-1995 and 1998-1999 show qualitatively
similar properties. On the grounds of a new type of scaling
law, this effect can be predicted analytically. Here
we only focus on the description of the phenomenon.

Trading activity of very small stocks shows nearly no
persistence. Even for $\Delta t > 1$ day, $H \approx 0.5$. This changes
as one moves to larger and larger companies. Their trading
can be more correlated in the regime $\Delta t > 1$ day, up
to $H \approx 0.9$. This is a clear sign of non-universality. The
very nature of trading differs for different company sizes,
and statistics such as "distributions of Hurst exponents"
are meaningless. No typical value exists; the trend is systematic
and continuous. As Hurst exponents are closely
related to the multifractal spectra [26, 29] of $f$, those cannot
be universal either. This raises doubts about an "average
multifractal spectrum" as calculated in, e.g., Ref.

m 

Systematic dependence of the exponent of the power
spectrum of the number of trades on capitalization was
previously reported in Ref. [31], based on the study of
88 stocks. This quantity is closely related to the Hurst
exponent for the time series of the number of trades per
unit time (see Ref. ^3). Direct analysis finds a strong
dependence of the Hurst exponent of $N$ on $\langle N \rangle$, but no
such clear logarithmic trend as Eq. (6) [25].

5 Multiscaling distribution of intertrade times 

Finally, we analyzed the intertrade interval series $T_i(n = 1 \ldots N_i - 1)$,
defined as the time spacings between the $n$-th
and $(n+1)$-th trade [32]. $N_i$ is the total number of trades
for stock $i$ during the period under study.

Previously, Ref. [13] used 30 stocks from the TAQ
database for the period 1993-1996 and proposed that
the distribution of $T_i$ scales with the mean $\langle T_i \rangle$ as

$$P(T_i) = \frac{1}{\langle T_i \rangle} F\left( \frac{T_i}{\langle T_i \rangle} \right), \qquad (7)$$

and the universal scaling function $F$ is well modeled by a
Weibull distribution of the form
$\Delta t$    Hill's method (p = 0.06)   Shifted Hill's $\lambda$   Shifted Hill's $\varphi$   Fraga Alves (p = 0.1)
1 min      1.43 ± 0.09                2.15 ± 0.15          3.0                  1.98 ± 0.25
5 min      1.56 ± 0.13                2.29 ± 0.25          2.8                  2.04 ± 0.25
15 min     1.71 ± 0.20                2.55 ± 0.35          2.8                  2.1 ± 0.3
60 min     2.06 ± 0.30                2.85 ± 0.45          1.8                  2.1 ± 0.4
120 min    2.3 ± 0.4                  3.15 ± 0.70          1.6                  2.1 ± 0.4
390 min    2.7 ± 0.6                  3.7 ± 0.9            1.2                  no estimate

Table 1. Median of the tail exponents of traded value calculated by three methods for 1994-1995. The width of the distributions
is given with the half distance of the 25% and 75% quantiles.



$\Delta t$    Hill's method (p = 0.06)   Shifted Hill's $\lambda$   Shifted Hill's $\varphi$   Fraga Alves (p = 0.1)
1 min      1.63 ± 0.13                2.40 ± 0.23          2.6                  2.16 ± 0.25
5 min      1.91 ± 0.25                2.8 ± 0.5            2.4                  2.30 ± 0.35
15 min     2.15 ± 0.40                3.1 ± 0.6            2.0                  2.35 ± 0.40
60 min     2.6 ± 0.5                  3.45 ± 0.8           1.2                  2.2 ± 0.4
120 min    2.8 ± 0.6                  3.8 ± 1.1            1.2                  no estimate
390 min    3.2 ± 1.0                  5.1 ± 0.8            1.6                  no estimate

Table 2. Median of the tail exponents of traded value calculated by three methods for 2000. The width of the distributions is
given with the half distance of the 25% and 75% quantiles.
Figure 4. The Hurst exponent of traded value $f$ shows logarithmic
dependence on the average traded value per minute $\langle f \rangle$.
For intraday fluctuations (O), correlations in $\langle f \rangle$ are weak, $H \approx 0.5$-$0.6$;
the fitted slope is $\gamma(\Delta t < 250\ \mathrm{min}) = 0.016 \pm 0.001$.
Beyond the daily scale (■) the effect increases: the smallest
stocks show almost no correlation ($H \sim 0.5$), while large
ones display strong persistence ($H \sim 0.9$). The fitted slope is
$\gamma(\Delta t > 630\ \mathrm{min}) = 0.063 \pm 0.002$. The inset shows the two
regimes of correlation strength for the single stock Wal-Mart
(WMT) on a log-log plot of $\sigma(\Delta t)$ versus $\Delta t$. The slopes
corresponding to Hurst exponents are 0.65 and 0.8.

where X ≈ 0.94 and S ≈ 0.72 for all the 30 stocks, with
some statistical deviations.

We analyzed the data by including a large number of
stocks with very different capitalizations. First it has to
be noted that the mean intertrade interval has decreased
drastically over the years. In this sense the stock market
cannot be considered stationary for periods much longer
than one year. We analyze the two-year period 1994-1995
(part of that used in Ref. [13]) and separately the single
year 2000. We use all stocks in the TAQ database with
$\langle T \rangle < 10^5$ sec, a total of 3924 and 4044 stocks, respectively.

In order to check the validity of the gap scaling formula,
we divided the stocks into two groups$^2$ with respect
to $\langle T \rangle$. Then, we generated the distribution of $T/\langle T \rangle$ for
the groups; a comparison for the year 2000 is shown in
Figure 5. This already raises doubts about the generality of
Eq. (7): the tails of the distribution seem to possess more
weight for the group with small $\langle T \rangle$ (blue chips). The direct
visual comparison of these distributions is, however,
not always a reliable method to evaluate universality. Instead,
we take a less arbitrary, indirect approach.

The consequence of the universal distribution (7) would
be that the moments of $T$ should show gap scaling: the
difference between the exponents of the $q$-th and $(q+1)$-th
moments is independent of $q$ [U :

$$\langle T_i^q \rangle = C(q)\, \langle T_i \rangle^{-\tau(q)}, \qquad (9)$$

with a scaling function$^3$ $-\tau(q) = q$.

Instead, we find a systematic dependence of $-\tau$ on $q$;
see Fig. 6 for several examples of fitting and Fig. 7 for all
results. There is a good fit to a power law of type (9) for 4
orders of magnitude in $\langle T \rangle$ with non-trivial exponents.

The intuitive meaning of $-\tau(q \gg 1) < q$ is simple:
intertrade times of larger (more frequently traded) stocks
exhibit larger relative fluctuations. In line with our observation
from Figure 5, this difference must come from
the tail of the distribution, as the deviation becomes more
pronounced for higher moments.
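The moment analysis of Eq. (9) can be illustrated with a short sketch, given here under stated assumptions and not as the fitting code behind the figures. The name `moment_scaling_exponents` is hypothetical. For each moment order $q$ it regresses $\log \langle T_i^q \rangle$ on $\log \langle T_i \rangle$ across stocks; the fitted slope plays the role of $-\tau(q)$.

```python
import numpy as np

def moment_scaling_exponents(intertrade_times_by_stock, qs):
    """Fit <T^q> = C(q) <T>^(-tau(q)) across stocks, cf. Eq. (9).

    intertrade_times_by_stock : list of 1D arrays, one per stock
    qs                        : moment orders to evaluate
    Returns a dict {q: fitted slope}, where the slope equals -tau(q).
    An ordinary least-squares fit in log10-log10 coordinates is an
    assumption of this sketch.
    """
    series = [np.asarray(T, dtype=float) for T in intertrade_times_by_stock]
    log_mean = np.log10([T.mean() for T in series])
    slopes = {}
    for q in qs:
        log_moment = np.log10([np.mean(T ** q) for T in series])
        slope, _intercept = np.polyfit(log_mean, log_moment, 1)
        slopes[q] = slope
    return slopes
```

If every stock had the same distribution of $T/\langle T \rangle$, the slopes would equal $q$ exactly (gap scaling); a slope below $q$ that drifts further away with growing $q$ is the multiscaling signature discussed above.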

2 The groups were constructed to have an approximately
equal total number of trades. Small $\langle T \rangle$ (top 246 stocks): 6.48
sec < $\langle T \rangle$ < 47.8 sec; large $\langle T \rangle$ (other 3797 stocks): 47.8 sec
< $\langle T \rangle$ < $10^5$ sec.

3 We keep the negative sign to conform with usual conventions.
Figure 5. The distribution of $T/\langle T \rangle$ in the year 2000 for two
groups of stocks with different mean intertrade times $\langle T \rangle$. The
group with the most frequently traded stocks (blue chips) has
a considerably greater weight in the tail of waiting times. This implies
that the distribution $P(T/\langle T \rangle)$ may not be universal.





Figure 7. Scaling exponents for the moments of intertrade interval
distributions defined in Eq. (9). The values $-\tau(q) = q$
would imply a universal distribution that is independent of
stock. The fact that $-\tau(q)/q < 1$ shows that less frequently traded
stocks display relatively lower variations in their trading dynamics.
For large $q$, the effect increases monotonically with $q$.
This suggests a difference between small and large stocks in the
tail of the distribution, which corresponds to longer periods of
inactivity.







Figure 6. Scaling of integer moments of $T$, $q = 1, 2, 4, 6, 8, 12, 16$
(increasing from bottom to top). The plot
shows $\langle T^q \rangle^{1/q} / \langle T \rangle$ versus $\langle T \rangle$ (sec); the slopes correspond to $-\tau(q)/q - 1$. If the
normalized distribution of $T$ were universal, the points would
align on horizontal lines. Note: The points were shifted vertically
for better visibility. Only 400 points are shown per moment;
the sample period was 1994-1995.



The absence of simple universal scaling raises the question
of the capitalization dependence of the Hurst exponent
for the time series $T_i$, defined analogously to Eq. (5)
as



(10) 



The data show a crossover, similar to that for the
traded value $f$, from a lower to a higher value of $H_T(i)$
when the window size is approximately the daily mean
number of trades (for an example, see the inset of Fig. 8).
For the restricted set studied in Ref. [13], the value
$H_T \approx 0.94 \pm 0.05$ was suggested for window sizes above
the crossover.

Much like the case of traded value Hurst exponents
analyzed in Section 4, the inclusion of more stocks$^4$
reveals the underlying systematic non-universality. Again,
less frequently traded stocks appear to have weaker autocorrelations,
as $H_T$ decreases monotonically with growing
$\langle T \rangle$. One can fit an approximate logarithmic law$^{5,6}$ to
characterize the trend:

$$H_T = H_T(\langle T \rangle = 1) + \gamma_T \log \langle T \rangle, \qquad (11)$$

where $\gamma_T = -0.10 \pm 0.02$ for the period 1994-1995 (see
Fig. 8) and $\gamma_T = -0.08 \pm 0.02$ for the year 2000 [H|.

In their recent preprint, Yuen and Ivanov [35] independently
show a tendency similar to Eq. (11) for intertrade
times of NYSE and NASDAQ in a different set of stocks.



4 For a reliable calculation of Hurst exponents, we had to
discard those stocks that had $\langle N \rangle < 10^{-3}$ trades/min
for 1994-1995 and $\langle N \rangle < 2 \cdot 10^{-3}$ trades/min for 2000. This
filtering leaves 3519 and 3775 stocks, respectively.

5 As intertrade intervals are closely related to the number of
trades per minute $N(t)$, it is not surprising to find a similar
tendency for that quantity [31].

6 Note that for window sizes smaller than the daily mean
number of trades, intertrade times are only weakly correlated
and the Hurst exponent is nearly independent of $\langle T \rangle$. This is
analogous to what was seen for traded value records in Sec. 4.

Figure 8. Hurst exponents of $T_i$ for time windows greater than
1 day, plotted versus the mean intertrade time $\langle T_i \rangle$ (sec). Stocks that
are traded less frequently show markedly weaker persistence
of $T$ for time scales longer than 1 day. The dotted horizontal
line serves as a reference. We used stocks with $\langle T \rangle < 10^5$
sec; the sample period was 1994-1995. The inset shows the
two regimes of correlation strength for the single stock General
Electric (GE) on a log-log plot of $\sigma(N)$ versus $N$. The slopes
corresponding to Hurst exponents are 0.6 and 0.89.

6 Conclusions 

In this paper we revisited some "stylized facts" of stock
market data and found several departures from
earlier conclusions. The main difference in our approach
was - besides the comparative application of extrapolation 
techniques - the extension of the range of capitalization 
of the studied firms. This enabled us to investigate the 
dependence of the trading characteristics on capitaliza- 
tion itself. In fact, in many cases we found fundamental 
dependence on this parameter. 

We have shown that trading activity $\langle f \rangle$, the number
of trades per minute $\langle N \rangle$ and the mean size of transactions
$\langle V \rangle$ display non-trivial, but monotonic dependence
on company capitalization.

We have given evidence that the distribution of traded
value in fixed time windows is not Levy stable. If a power
law is fitted to the tail of the distribution, a careful analysis
yields an exponent $\lambda$ which is - even for short time
windows - in most cases greater than 2, and which then increases
with increasing time window, indicating the existence of
the second moment of the distribution. Consequently, the
Hurst exponent $H$ for its variance can be defined, and it
depends on the mean trading activity $\langle f \rangle$ as

$$H(i) = H(i=1) + \gamma \log \langle f_i \rangle.$$

The mean transaction size can be fitted to a power-law
dependence on the trading frequency for moderate to
large companies.

The distribution of the waiting times between trades
is better described by multiscaling than by gap scaling. It
is characterized by an increase in both correlations and
relative fluctuations with growing trading frequency (i.e.,
increasing capitalization).

Our findings indicate that special care must be taken 
when concepts like scaling and universality are applied to 
financial processes. The modeling of the market should 
be extended to the capitalization dependence of the char- 
acteristic quantities and this seems a real challenge at 
present. 
A The estimation of tail exponents $\lambda$

In the following, for every measurement we give the median
estimates of $\lambda$ for the 1000 stocks with highest traded
value during the investigated period. The error bars show
the half distance between the 25% and 75% quantiles of
$\lambda$.



A.1 Hill's estimator

Hill's estimator is a statistically consistent method to
estimate the tail exponent $\lambda$ from random samples taken
from a distribution that asymptotically has the power-law
form (4). The procedure first sorts the sample $f(t = 1 \ldots L)$
in decreasing order. We denote this sorted
series by $f[t]$, so that $f[1] \geq f[2] \geq f[3] \geq \ldots$. Then,
one defines the tail of the distribution by setting an arbitrary
number $k$ of points to be included in the estimation
procedure. The estimate of the inverse tail exponent is

$$\lambda^{-1}(k) = \frac{1}{k-1} \sum_{t=1}^{k-1} \log f[t] - \log f[k], \qquad (12)$$

given that $k \to \infty$ and $p = k/L \to 0$. If the sampled
distribution is of the form (4), then by increasing $k$, the
estimator converges rapidly to the actual value of $\lambda^{-1}$.
However, in the case of traded value data, this turns out
not to be the case.
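Eq. (12) translates almost directly into code. The following sketch is an illustration, not the implementation used for the tables; the name `hill_tail_exponent` is hypothetical. Natural logarithms are used here because Hill's estimator is defined with them, although the rest of the paper works with base-10 logarithms.

```python
import numpy as np

def hill_tail_exponent(sample, k):
    """Hill estimate of the tail exponent from the k largest events, Eq. (12).

    sample : 1D array of positive observations (e.g. traded values)
    k      : number of largest events regarded as the tail (2 <= k <= L)
    Uses natural logarithms, as required by Hill's estimator.
    """
    f = np.sort(np.asarray(sample, dtype=float))[::-1]  # decreasing order
    if not 2 <= k <= len(f):
        raise ValueError("k must satisfy 2 <= k <= len(sample)")
    if f[k - 1] <= 0:
        raise ValueError("the k largest observations must be positive")
    # Mean log of the k-1 largest values minus the log of the k-th largest
    inv_lambda = np.mean(np.log(f[: k - 1])) - np.log(f[k - 1])
    return 1.0 / inv_lambda
```

Evaluating the function for a range of `k` (equivalently, of $p = k/L$) produces a Hill plot. The shifted variant of Appendix A.2 corresponds to calling the same function on the sample with a constant added to every observation.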

The inset of Fig. Efa) - a so-called Hill plot - shows
that there is a systematic dependence of $\lambda$ on $p$ and no
convergence is observed. With the inclusion of fewer tail
events, the exponent increases sharply, beyond the $\lambda = 2$
threshold for Levy stability. Further evidence for the lack
of Levy stability is that on increasing the time scale $\Delta t$,
the estimated tail exponents also increase further, as shown
in Fig. Efa).

This type of behavior is not new to mathematical statistics
(see, e.g., Ref. HH). It is possible that the distribution
decays faster than a power law and thus no finite $\lambda$ exists.
Alternatively, the power law may not be centered around
zero, but instead it can be of the form

$$P_{\Delta t}(f) \propto (f + f_0)^{-(\lambda+1)}. \qquad (13)$$

In this latter case, there is a finite $\lambda$, but as the sample size
T is usually too small, the estimator displays the above
bias. One can either try to approximate the value of $f_0$
and shift the data accordingly, so that Hill's estimator
converges properly, or try to find another estimator that
is insensitive to this shifting constant.

We have tried both approaches and they yielded qual- 
itatively similar results. 
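
One way to see where the bias of the unshifted estimator comes from is to look at the local logarithmic slope of the tail implied by Eq. (13). The following is a short worked step under the assumption $f_0 > 0$; the symbol $\lambda_{\mathrm{eff}}$ is introduced here only for illustration.

```latex
\int_f^{\infty} P_{\Delta t}(f')\, df' \propto (f + f_0)^{-\lambda}
\quad\Longrightarrow\quad
\lambda_{\mathrm{eff}}(f) \equiv
-\frac{d}{d \log f} \log \int_f^{\infty} P_{\Delta t}(f')\, df'
= \lambda\, \frac{f}{f + f_0}.
```

For any finite $f$ this effective exponent is smaller than $\lambda$, and it approaches $\lambda$ only for $f \gg f_0$. This is in line with the Hill plots, where the estimate grows as fewer tail events are included.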



A.2 Shifted Hill's estimator

One can apply Hill's estimator to the points
$f[t = 1 \ldots L] + \varphi^{(f)}$, where $\varphi$ is a constant
parameter, and look for a value where the estimator $\lambda(k)$
becomes independent$^7$ of $k$, i.e., Hill's estimator truly finds
a power-law decay that is now consistent with Eq. (13). For
an example, see Fig. 9(a). This happens when
$\varphi^{(f)} = f_0$. How this shift by $\varphi^{(f)}$ affects the
Hill plots is shown in Fig. 9(b) for the case of
$\Delta t = 15$ min. One finds that in this case
$\varphi \approx 2.8$ gives reasonable results, while
$\lambda = 2.55 \pm 0.35$. One can repeat the procedure for
various time scales $\Delta t$. The median Hill plots are shown
in Fig. 9(c), while $\lambda(\Delta t)$ and $\varphi(\Delta t)$
are given in Tabled Again, one finds a significant increase 
of the tail exponent with growing $\Delta t$. This underlines our
previous expectation that traded value distributions are
not Lévy stable and thus have a finite variance.
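
The scan over the shifting constant can be sketched as follows. This is a minimal illustration, assuming NumPy and synthetic data with a known shift. The names `hill_curve` and `flattest_shift` are hypothetical, and measuring flatness by the standard deviation of $\lambda(k)$ is an assumption of the sketch, since the text does not state how independence of $k$ was judged.

```python
import numpy as np

def hill_curve(sample, ks, shift=0.0):
    """Hill plot lambda(k) for the shifted points f[t] + shift, using Eq. (12)."""
    ks = np.asarray(ks, dtype=int)
    f = np.sort(np.asarray(sample, dtype=float))[::-1] + shift  # decreasing order
    logs = np.log(f[: ks.max()])
    partial_sums = np.cumsum(logs)  # partial_sums[j] = logs[0] + ... + logs[j]
    inverse = partial_sums[ks - 2] / (ks - 1) - logs[ks - 1]
    return 1.0 / inverse

def flattest_shift(sample, ks, shifts):
    """Return the shift whose Hill plot varies least with k, and that Hill plot.

    The flatness measure (standard deviation over k) is an assumption of
    this sketch, not a criterion taken from the text.
    """
    spreads = np.array([np.std(hill_curve(sample, ks, s)) for s in shifts])
    best = int(np.argmin(spreads))
    return float(shifts[best]), hill_curve(sample, ks, shifts[best])

# Synthetic data (assumed, not the paper's): density ~ (f + f0)^-(lambda + 1)
# with lambda = 3 and f0 = 2.
rng = np.random.default_rng(1)
f0_true, lambda_true = 2.0, 3.0
sample = f0_true * rng.pareto(lambda_true, size=400_000)
ks = np.arange(2_000, 40_001, 2_000)   # tail probabilities p = k/L up to 0.1
shifts = np.arange(0.0, 5.01, 0.2)     # increments of 0.2, as in footnote 7
best_shift, best_curve = flattest_shift(sample, ks, shifts)
unshifted_curve = hill_curve(sample, ks, 0.0)
```

On such synthetic data the unshifted Hill plot drifts with $k$ and stays well below the true exponent at large $k$, while the flattest curve is expected near the true shift. With noisy empirical data the choice of the shift is much less clear-cut, as footnote 7 points out.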



A.3 Fraga Alves estimator

A more sophisticated approach to estimate tail exponents
of distributions of the type (13) is a recent variant of Hill's
method, proposed by Fraga Alves [23]. The algorithm is
described in detail in Appendix B and its estimates of $\lambda$
are, in an exact mathematical sense, independent of the
shift $f_0$ present in the density function, unlike those of the
original Hill's estimator (12).

We applied the estimator to the same dataset; the Hill
plots for $\Delta t = 1, 15, 60$ min are shown in Fig. 9(d). What
one finds is a very different behavior from the shifted Hill's
estimator. The estimate of $\lambda$ increases with growing $k$, i.e.,
as more points are included. This is because the Fraga
Alves estimator converges much more slowly than Hill's
estimator and, as Fig. 9(d) and Monte Carlo simulations on
surrogate datasets indicate, it converges from below. On
the other hand, setting the threshold as high as $p = 0.1$
may include events that no longer belong to the power-law
regime, which also results in a reduced, effective exponent
due to the shape of the distribution, shown in Fig. 3.
Consequently, this method provides a lower estimate of $\lambda$.
Still, the calculated values are mostly above 2. Finally, one
must note that for $\Delta t > 120$ min, the number of points
was inadequate to provide any proper estimate at all.



7 More precisely, we increased $\varphi$ from 0 by increments of 0.2
and looked for $\lambda(k, \varphi) \approx \lambda(\varphi)$. The method is
very sensitive to the proper choice of $\varphi$. For high values
of $\Delta t$, there is a low number of data points, and the
estimates of $\lambda$ may be very noisy. In this case we chose the
$\varphi$ where the estimate of $\lambda$ is lower.



B The algorithm of the Fraga Alves estimator 

Ref. [23] describes a method to approximate the parameter
$\lambda$ from a sample of a random variable that is
asymptotically distributed as

$$P_{\Delta t}(f) \propto (f + f_0)^{-(\lambda+1)}.$$

First, one sorts the sample $f(t = 1 \ldots L)$ in decreasing
order. We denote this series by $f[t]$, so that
$f[1] \geq f[2] \geq f[3] \geq \ldots$. Then, the procedure consists
of the five steps formulated below:

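1. Set $k_0^* = 2k^{2/3}$.
2. Calculate the estimate $\lambda^{-1}(k_0^*, k)$.
3. Set

$$k_0 = c_0^{1/(2\lambda^{-1}(k_0^*, k) + 1)}\, k^{a},$$

where

$$c_0 = \frac{\left(1 + \lambda^{-1}(k_0^*, k)\right)^2}{2\lambda^{-1}(k_0^*, k)}$$

and

$$a = \frac{2\lambda^{-1}(k_0^*, k)}{2\lambda^{-1}(k_0^*, k) + 1}.$$

4. Calculate the estimate $\lambda^{-1}(k_0, k)$.
5. Finally, the estimate of the inverse tail exponent is
given by

$$\tilde{\lambda}^{-1}(k_0, k) = \lambda^{-1}(k_0, k) - \sqrt{\frac{\lambda^{-1}(k_0, k)}{2k_0}}.$$

$\tilde{\lambda}^{-1}(k_0, k)$ converges to the inverse tail exponent if
$L \to \infty$, $k/L \to 0$ and $k_0/k \to 0$.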
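
The five steps can be sketched in Python as follows. This is a hedged illustration, assuming NumPy. The function names are hypothetical, the rounding and clamping of $k_0$ are choices of the sketch, and the explicit form used for $\lambda^{-1}(k_0, k)$ in `li_hill` (a Hill-type average over differences from the $(k+1)$-th largest point) is the location-invariant form as we understand it from the literature, so its index conventions should be treated as an assumption rather than as the formula used in the text.

```python
import numpy as np

def li_hill(f_desc, k0, k):
    """Location-invariant Hill-type estimate of 1/lambda (assumed form).

    Mean over t = 1..k0 of log[(f[t] - f[k+1]) / (f[k0+1] - f[k+1])],
    with 1-based f[t] sorted in decreasing order as in the text.
    """
    top = f_desc[:k0] - f_desc[k]   # f[t] - f[k+1] for t = 1..k0
    ref = f_desc[k0] - f_desc[k]    # f[k0+1] - f[k+1]
    return float(np.mean(np.log(top / ref)))

def fraga_alves_inverse_exponent(sample, k):
    """Five-step estimate of the inverse tail exponent (sketch)."""
    f = np.sort(np.asarray(sample, dtype=float))[::-1]
    if not 9 <= k < f.size:
        raise ValueError("need 9 <= k < sample size")
    # Step 1: initial number of top points (rounded and clamped in this sketch).
    k0_star = min(max(int(round(2.0 * k ** (2.0 / 3.0))), 1), k - 1)
    # Step 2: first estimate of 1/lambda.
    g_star = li_hill(f, k0_star, k)
    # Step 3: adaptive k0 = c0^(1/(2g+1)) * k^a.
    c0 = (1.0 + g_star) ** 2 / (2.0 * g_star)
    a = 2.0 * g_star / (2.0 * g_star + 1.0)
    k0 = int(round(c0 ** (1.0 / (2.0 * g_star + 1.0)) * k ** a))
    k0 = min(max(k0, 1), k - 1)
    # Step 4: estimate with the adaptive k0.
    g = li_hill(f, k0, k)
    # Step 5: subtract the correction term.
    return g - np.sqrt(g / (2.0 * k0))

# Synthetic data (assumed): density ~ (f + f0)^-(lambda + 1), lambda = 3, f0 = 2.
rng = np.random.default_rng(2)
sample = 2.0 * rng.pareto(3.0, size=400_000)
inv_lambda = fraga_alves_inverse_exponent(sample, k=40_000)
inv_lambda_shifted = fraga_alves_inverse_exponent(sample + 10.0, k=40_000)
```

Because only differences of order statistics enter `li_hill`, adding a constant to the whole sample should leave the estimate unchanged up to rounding, which is the shift independence described in Appendix A.3.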
Figure 9. (a) Hill's estimates of $\lambda$ for different sizes of the time window with the tail probability set as $p = 0.06$. The monotonic
trend indicates that the distribution is not Lévy stable. The inset shows that for $\Delta t = 15$ min the effective tail exponent $\lambda$
depends monotonically on the choice of tail probability $p$. Thus, Hill's estimates are unreliable, because they depend strongly
on an arbitrary parameter. (b) Dependence of the Hill plots for $\Delta t = 15$ min on the shifting constant $\varphi$. The values of $\varphi$ from
bottom to top: 0 (□), 1 (△), 2.8 (•, optimal shift), 3.0 (○). Typical error bars are given on the right; darker gray indicates the
regimes where they overlap. (c) Hill plots of the optimally shifted Hill's estimators for various time windows. The values of $\Delta t$
from bottom to top: 1 min (■), 5 min (•), 15 min (▲), 60 min (▼), 120 min (♦), 390 min (*). One finds $\lambda > 2$, and the strong
increasing tendency in $\lambda$ with $\Delta t$ implies that the distribution is not Lévy stable. (d) Hill plots of the Fraga Alves estimator
for three time window sizes $\Delta t$: 1 min (○), 15 min (■), 60 min (□). The method gives a lower estimate of $\lambda \approx 2$.



